ABSTRACT

We study the evolution of the network properties of a populated network embedded in a 
genotype space characterised by either a low or a high number of potential links, with par- 
ticular emphasis on the connectivity and clustering. Evolution produces two distinct types 
of network. When a specific genotype is only able to influence a few other genotypes, the 
ecology consists of separate non-interacting clusters in genotype space. When different types 
may influence a large number of other sites, the network becomes one large interconnected 
cluster. The distribution of interaction strengths — but not the number of connections — 
changes significantly with time. We find that the species abundance is only realistic for a 
high level of species connectivity. This suggests that real ecosystems form one intercon- 
nected whole in which selection leads to stronger interactions between the different types. 
Analogies with niche and neutral theory are also considered. 

Keywords: ecosystems; networks; species abundance distribution; neutral and niche the- 
ory; evolution and self-organisation. 
1 INTRODUCTION

An important characteristic of an ecosystem is the total set of interactions between the 
various individuals. Organisms may influence each other in many ways and it is difficult to 
monitor and quantify all possible interactions except the most direct, such as simple trophic 
relations. The development of the set of interactions over evolutionary time scales is even 
more difficult to measure because of random mutations and the resulting adaptations. Gain- 
ing an understanding from observations is also problematic since laws may very well only be 
recognisable at the level of averages, see [Loreau and Hector, 2001; Yedid and Bell, 2002].



Here, we approach these issues within the framework of a simple model of ecosystem
assembly and evolution [Christensen et al., 2002; Hall et al., 2002; di Collobiano et al., 2003].



We compare early and late time connectivity and cluster properties of ecosystems evolv- 
ing in two differently connected spaces: genotypes influence either a small or a large number 
of others. Clearly, the actual number of interactions experienced by a site depends on which 
of all the possible mutations and adaptations have occurred, i.e. the network is dependent 
on its history. It turns out that the interaction strengths change significantly with time, 
whilst the degree (number of active interactions) distribution remains close to what would 
be expected if genotypes were occupied at random. The species abundance curve takes a
log-normal form only for spaces where the genotypes are linked to many others. Our model 
is neutral in some aspects but also draws on concepts from niche theory. 



2 METHODS 

We briefly describe the structure and dynamics of the Tangled Nature model. Details can 
be found in [Christensen et al., 2002; Hall et al., 2002]. An individual is represented by a
vector $\mathbf{S}^a = (S^a_1, S^a_2, \ldots, S^a_L)$ in the genotype space $\mathcal{S}$, where the "genes" $S^a_i$ may take the
values ±1, i.e. $\mathbf{S}^a$ denotes a corner of the L-dimensional hypercube. In the present paper
we take L = 20. We think of the genotype space S as containing all possible ways of 
combining the genes into genome sequences. Many sequences may not correspond to viable 
organisms. The viability of a genotype is determined by the evolutionary dynamics. All 
possible sequences are made available for evolution to select from. The number of occupied 
sites is referred to as the diversity. 

For simplicity, an individual is removed from the system with a constant probability $p_{kill}$
per time step. A time step consists of one annihilation attempt followed by one reproduction
attempt. One generation consists of $N(t)/p_{kill}$ time steps, which is the average time taken
to kill all currently living individuals. All references to time will be in units of generational 
time. 

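The ability of an individual to reproduce is controlled by a weight function $H(\mathbf{S}^a, t)$:

$$H(\mathbf{S}^a, t) = \frac{1}{c N(t)} \left( \sum_{\mathbf{S} \in \mathcal{S}} J(\mathbf{S}^a, \mathbf{S})\, n(\mathbf{S}, t) \right) - \mu N(t), \qquad (1)$$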

where c is a control parameter, $N(t)$ is the total number of individuals at time t, the sum is
over the $2^L$ locations in $\mathcal{S}$ and $n(\mathbf{S}, t)$ is the number of individuals (or occupancy) at position
$\mathbf{S}$. Two positions $\mathbf{S}^a$ and $\mathbf{S}^b$ in genome space are coupled with the fixed random strength
$J^{ab} = J(\mathbf{S}^a, \mathbf{S}^b)$, which can be either positive, negative or zero. This link is non-zero with
probability θ. There is no self-interaction, so $J^{aa} = 0$. The present paper compares the three
cases θ = 0.001, 0.005 and 0.25. The non-zero values of $J^{ab} \neq J^{ba}$ are determined by a
deterministic but rapidly varying function of the two positions $\mathbf{S}^a$ and $\mathbf{S}^b$ [Hall et al., 2002].

The conditions of the physical environment are simplistically described by the term
$\mu N(t)$ in equation (1), where μ determines the average sustainable total population size, i.e.
the carrying capacity of the environment. An increase in μ corresponds to harsher physical
conditions. Notice that genotypes only adapt to each other and the physical environment
represented by μ. We use asexual reproduction consisting of one individual being replaced
by two copies. Successful reproduction occurs with a probability per unit time given by 

$$p_{off}(\mathbf{S}^a, t) = \frac{\exp[H(\mathbf{S}^a, t)]}{1 + \exp[H(\mathbf{S}^a, t)]}. \qquad (2)$$
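The weight function and the reproduction probability can be sketched in a few lines of Python. This is only an illustration under stated assumptions: genotypes are encoded as integers, and `coupling` is a hypothetical stand-in for the coupling function of [Hall et al., 2002], which is not reproduced here. The sketch assumes that a link exists for an unordered pair with probability θ and that strengths are drawn uniformly from [-1, 1]; neither detail is taken from the text.

```python
import math
import random

def coupling(a, b, theta, seed=0):
    """Hypothetical stand-in for J(S^a, S^b); genotypes are encoded as integers."""
    if a == b:
        return 0.0  # no self-interaction: J^{aa} = 0
    lo, hi = min(a, b), max(a, b)
    # Assumption: the link exists for the unordered pair with probability theta ...
    if random.Random(f"link:{seed}:{lo}:{hi}").random() >= theta:
        return 0.0
    # ... while the strength depends on the ordered pair, so J^{ab} != J^{ba} in general.
    return random.Random(f"strength:{seed}:{a}:{b}").uniform(-1.0, 1.0)

def weight(a, occupancy, theta, c=0.005, mu=0.01):
    """Weight function of equation (1) for an occupancy map {genotype: count}."""
    n_total = sum(occupancy.values())
    interaction = sum(coupling(a, b, theta) * n for b, n in occupancy.items())
    return interaction / (c * n_total) - mu * n_total

def p_off(h):
    """Reproduction probability exp(H) / (1 + exp(H)), written to avoid overflow."""
    if h >= 0:
        return 1.0 / (1.0 + math.exp(-h))
    e = math.exp(h)
    return e / (1.0 + e)
```

Seeding the generator from the pair of positions keeps the couplings fixed for the whole run without storing all pairs, which mirrors the "deterministic but rapidly varying" description only in spirit.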

We allow for mutations in the following way: with probability $p_{mut}$ per gene we perform a
change of sign, $S^a_i \to -S^a_i$, during reproduction.
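The mutation step can be illustrated as follows. This is a minimal sketch, assuming genomes are stored as tuples of ±1 and that both offspring copies are mutated independently; the function names are hypothetical and the second assumption is not stated in the text.

```python
import random

def mutate(genome, p_mut, rng):
    """Flip the sign of each gene independently with probability p_mut."""
    return tuple(-g if rng.random() < p_mut else g for g in genome)

def reproduce(genome, p_mut, rng):
    """Replace one parent by two copies, each mutated independently (assumption)."""
    return mutate(genome, p_mut, rng), mutate(genome, p_mut, rng)
```

For example, `reproduce((1,) * 20, 0.015, random.Random(1))` returns two genomes of length L = 20 that differ from the parent in, on average, 0.3 genes each.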

Initially, we place $N(0) = 500$ individuals at randomly chosen positions. Their initial
location in genotype space does not affect the dynamics. A two-phase switching dynamic is 
seen consisting of long periods of relatively stable configurations (quasi-Evolutionary Stable 
Strategies or q-ESSs) interrupted by brief spells of reorganisation of occupancy which are 
terminated when a new q-ESS is found, as discussed in [Christensen et al., 2002].
3 RESULTS



a Parameters 

As mentioned previously, the only parameter that is changed in this paper is the connectivity,
θ. The values used throughout are L = 20, c = 0.005, μ = 0.01, $p_{kill}$ = 0.2 and
$p_{mut}$ = 0.015. This selection provides several transitions between different q-ESSs for each
run, see [Hall et al., 2002; di Collobiano et al., 2003]. We consider three θ values: 0.001,
0.005 and 0.25, which we will refer to as very low, low and high θ respectively. These
correspond to below, near and above the percolation threshold, that is, the point where
there is a non-zero probability that all living sites are connected in one dominant cluster
[Albert and Barabasi, 2002]. A realistic species abundance curve was only obtained above
the threshold. We will be contrasting results at t = 500 (primal time), t = 5 000 (early
time) and t = 500 000 (late time). Early time is well outside the system's initial transient
search for a quasi-stable configuration in genotype space. The low and high θ ensembles
consist of 500 realisations. Each run uses a different random number seed but, for any given
run, only θ is changed between the two ensembles.

Each figure in the paper, apart from figure 4, shows two sets of data: one labelled
simulation (the results generated by the dynamics of the model) and the other
random. In the random case, rather than evolving the network, for any specified time we
read in the diversity and number of individuals alive in the simulated run. The individuals
are then thrown on to the network of $2^{20}$ genotypes at random with the constraint that the
diversity is the same as the simulation. Thus, random data is not dependent on the history 
of the network, but has the same global properties (diversity and population size) as the 
simulations. This provides a very useful null model. Comparisons with this procedure will 
reveal whether the network is really evolving, or the results are just by-products of increasing 
diversity. Simulated data is always shown as a dotted line and random as a continuous line. 
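The null model can be sketched as below. This is one possible reading rather than the procedure actually used: it picks the required number of distinct sites, places one individual on each so that the diversity constraint holds, and then scatters the remaining individuals uniformly over those sites. The function name and the two-stage placement are assumptions.

```python
import random
from collections import Counter

def null_model(diversity, population, L, rng):
    """Random occupancy with the same diversity and population as a simulated run."""
    if not 0 < diversity <= min(population, 2 ** L):
        raise ValueError("diversity must be positive and at most min(population, 2**L)")
    # Choose distinct genotypes (integer-encoded) uniformly from the 2**L positions.
    sites = rng.sample(range(2 ** L), diversity)
    occupancy = Counter(sites)  # one individual per site fixes the diversity
    # Scatter the remaining individuals uniformly over the chosen sites.
    occupancy.update(rng.choices(sites, k=population - diversity))
    return occupancy
```

Calling `null_model(200, 2700, 20, random.Random(0))` with these illustrative values yields an occupancy map with exactly 200 occupied sites and 2700 individuals, independent of any history.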

b Connectivity 

We study the temporal evolution of the network connectivity in the space of occupied
positions for different θ values. Note that the hard-wired configuration of couplings $J(\mathbf{S}^a, \mathbf{S}^b)$
between all $2^{20}$ positions in genotype space is determined at t = 0 and remains constant.
The network of occupied sites will nevertheless change with time. The degree distributions
in figure 1 show the number of genotypes having x active interactions.

The leftmost pair of curves represents primal time, the next, early time and the rightmost 
late time. Considering only the simulation data for now, a clear shift to a greater number of 
active links is seen in the high θ case, whilst a slight change occurs for low θ. The difference
between early and late time is bigger than that between early and primal time. The degree
of a site is equal to the number of direct interactions it has with all other occupied sites.
This explains why any particular site in the low θ runs only has at most nine and usually
only one or two direct interactions. The data is integrated over each ensemble. How much
of this shift is due to a genuine change in network connectivity? For high θ, the null model
data shows that there is very little difference between evolving the network and throwing
down individuals randomly. Low θ appears to show a change. However, any site that does
not interact with any others will die very quickly in a simulation. If for any instant in time
genotype positions are chosen by chance, such a low connectivity will give a disproportionate
number of isolated genomes that would be forbidden by the dynamics. There is no fair way
to simulate this effect, but it can be seen that the differences between the time curves in the
random and simulated runs are similar and thus the network connectivity does not evolve for
either value of θ.
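The degree measurement described in this section can be illustrated with a short sketch. It assumes a caller-supplied predicate `is_linked(a, b)` (a hypothetical name) that reports whether two occupied sites have a non-zero coupling; how the two directions of a coupling are combined into one link is left to that predicate.

```python
from collections import Counter

def degree_histogram(occupied, is_linked):
    """Map each degree x to the number of occupied sites with x active interactions."""
    occupied = list(occupied)
    degrees = {
        a: sum(1 for b in occupied if b != a and is_linked(a, b))
        for a in occupied
    }
    return Counter(degrees.values())
```

Applying the same function to a simulated configuration and to its randomised counterpart gives the pairs of curves that are compared in the degree histograms.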



c Interaction strength 

For both θ values, the diversity gradually increases with time. What is causing this? It turns
out that the strength of the interaction between sites is crucial to the ability of the network
to support larger numbers of individuals. Figure 2 shows the distribution of interaction
strengths between each living site and all other living sites at a given time. Interaction
strengths are assigned at random and are not necessarily symmetric [Hall et al., 2002]. For
example, $J(\mathbf{S}^a, \mathbf{S}^b) = 0.3$, but $J(\mathbf{S}^b, \mathbf{S}^a) = -0.2$. For all times and both values of θ, the
distribution for the random data is a sharply pointed, symmetrical curve peaking at J = 0.
This makes sense because there is no bias in the ratio of positive to negative links when the 
links are assigned to the "bare" network at t = 0. 

For reasons of clarity, the simulation results are only shown for primal and late time. 
Clearly, a significant change takes place for high θ between t = 0 and primal time. Some
weight is taken from the peak at J = 0 and from the negative J values, and
redistributed into positive strengths, i.e. the curve shifts right. This comprises a significant
shift in the probability density. The move from primal to late time is smaller (but
still noticeable) since the large number of reorganisations of genotype space in the early
generations drops to occasional punctuation of q-ESSs later in the run. (Typically, there
are only one or two transitions from early time onwards.) Despite this, the curve continues
to drift to the right. On first inspection, the low θ runs seem to have changed dramatically
from the initial configuration. However, nothing particularly interesting is happening here:
it is simply an effect of the structure of clusters in the low θ space and is explained below.



d Clustering 

The indirect connectivity or clustering (how sites are linked to each other through other 
sites) is another useful network measure. For high θ we find that at any given time, all
occupied sites in genotype space belong to one and the same cluster. Thus the cluster size
for high θ simply follows the diversity. In contrast, we observe the formation of distinct
clusters for low θ. The rest of this section will deal solely with low θ.

The overall structure of the clusters does not seem to change much with time. This is 
the first indication that the clustering is not an evolving property of the network. As would 
be expected, one-clusters are transient. They are born on a new site as mutants from a 
parent but are isolated from other sites and so are extremely unlikely to reproduce. (Since
μ = 0.01 and the average total population is about 2700, $p_{off} \sim 10^{-12} \ll p_{kill} = 0.2$, so an
isolated site is much more likely to be killed than multiply when chosen.) These sites are
simply flashing in and out of existence. 
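The estimate for an isolated site follows directly from equations (1) and (2). With no non-zero couplings to other occupied sites the interaction sum vanishes, so only the environmental term remains. Taking μ = 0.01 and N ≈ 2700 as quoted above:

$$H = -\mu N \approx -0.01 \times 2700 = -27, \qquad p_{off} = \frac{e^{-27}}{1 + e^{-27}} \approx 1.9 \times 10^{-12}.$$

This is roughly eleven orders of magnitude smaller than the killing probability of 0.2 per attempt.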

Simulations indicate that the building blocks of larger groups are two-clusters. These 
tend to be two very old sites that have mutually positive links. Large clusters are formed 
mainly from very old two-clusters joined together by a mutant. The continual background of 
mutants flitting in and out of the network plugs these building blocks together. However, the 
entire cluster is rarely long lived, whereas the two-clusters are formed early in any particular
run, quickly build up their population and are very persistent since their occupation is high. 

Clusters do indeed generally increase in size with time. There are, however, large fluc- 
tuations in the record size which gives an indication of how unstable these large clusters 
are, see figure 3. The largest recorded cluster in any run at any time contained 281 sites.
It is revealing to compare the results from the null model, where the maximum cluster 
size is much smaller but just as variable. When individuals are thrown down at random, 
two-clusters are no longer the building blocks of the large clusters and any long string of 
connected sites is determined purely by chance and hence the biggest cluster will always be 
smaller than that produced by the dynamics. The temporary nature of the large clusters is 
further borne out by time and ensemble averaging the cluster sizes. The number of clusters 
of a particular size S is stored at intervals of 5000 generations from early to late time for 
each of the 500 runs and the time and ensemble average is then calculated. As expected, 
there are many one-clusters and fewer large clusters. 

The distribution follows the functional form $n_s \sim s^{-5/2} e^{-(p - p_c) s}$ anticipated for the
cluster size distribution on a random graph of D (≈ 195) nodes. (See equation (36) in
[Albert and Barabasi, 2002].) The percolation threshold is very close to the considered
connectivity θ = 0.005. A comparison with the random data shows that this scale-independent
distribution is not due to the dynamics of the system, the only difference being
the appearance of larger record clusters in the simulation, as shown in figure 3. This is
perhaps the most compelling piece of evidence that the low θ regime does not show any
emergent structure. We also ran simulations for very low θ (0.001) and found that the
cluster sizes were exponentially distributed as would be expected below the percolation
threshold.
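The cluster-size statistics discussed here can be obtained with a simple graph traversal. The sketch below is an illustration, not the analysis code behind the results: it assumes a symmetric, caller-supplied predicate `is_linked(a, b)` (a hypothetical name) and treats two occupied sites as belonging to the same cluster whenever a chain of links joins them.

```python
from collections import Counter

def cluster_sizes(occupied, is_linked):
    """Map each cluster size s to the number of clusters of that size."""
    unvisited = set(occupied)
    sizes = []
    while unvisited:
        start = unvisited.pop()
        stack, size = [start], 1
        while stack:
            a = stack.pop()
            # Collect unvisited neighbours first, then remove them from the set.
            neighbours = [b for b in unvisited if is_linked(a, b)]
            for b in neighbours:
                unvisited.remove(b)
            stack.extend(neighbours)
            size += len(neighbours)
        sizes.append(size)
    return Counter(sizes)
```

The largest key of the returned counter is the record cluster size for that configuration. Averaging the counters over times and runs gives the cluster size distribution.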

Figure 2 (low θ) can now be easily explained. Unconnected sites die extremely quickly
in the first few generations leaving behind two-clusters and other sites with positive
interactions. The slight increase in sites with J > 0 for late time is caused by well established
positive-positive two-clusters. So what looked like an interesting result initially proves to 
be due to the fairly constant microscopic structure of the network. 

e Species abundance 

The Species Abundance Distribution or SAD is important in characterising ecosystems. It 
is the proportion of species that contain p individuals. We define a species as one site 
in genotype space. Ideally, we would like to use a coarse-grained definition more likely 
to reflect real ecologies, where species are defined as groups of points in genotype space 
echoing the genotypic cluster species definition introduced by Mallet [Mallet, 1995]. Since
the maximum number of genotypes in our model is only around $10^6$ anyway, the single
site species approach is more appropriate. We have been able to extend the initial results
obtained in [Hall et al., 2002] and can consider the evolution of the SAD for high and low
θ integrated across all 500 runs, as seen in figure 4. The larger ensembles allow enough
statistics for illuminating conclusions to be drawn. Note that the null model is absent since 
when individuals are sprinkled randomly across the living sites, there is no tendency for 
accumulation on any particular site, so the individuals follow a multinomial distribution. 
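The species abundance distribution as defined in this section, with one site in genotype space counted as one species, can be computed as in the following minimal sketch. The function name and the occupancy-map input format are assumptions made for illustration.

```python
from collections import Counter

def species_abundance(occupancy):
    """Proportion of species (occupied sites) that contain p individuals, keyed by p."""
    counts = Counter(n for n in occupancy.values() if n > 0)
    total = sum(counts.values())
    return {p: k / total for p, k in sorted(counts.items())}
```

For instance, four species with 1, 1, 4 and 10 individuals give proportions 0.5, 0.25 and 0.25 at p = 1, 4 and 10 respectively.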

The key result of this paper is that only high θ leads to a SAD similar to those observed
in nature. Low θ is skewed by its heavily populated two-clusters. The plots for high θ
show the log-normal form observed in many real ecosystems and in other ecological models,
see [McKane et al., 2000; Hubbell, 2001]. They appear to become more log-normal as time
increases with the dip between four and eight individuals falling, even though the diversity
is rising. Hence, the SAD is evolving. From this, it seems that the high θ case structures
itself more like a real ecosystem than low θ, whose SAD develops a sharper peak as the
two-clusters become densely occupied. The single cluster of highly interdependent genomes
produces a reasonable SAD that cannot be formed by patches of isolated clusters.

Thus the abstract parameter θ, which cannot be measured in a simple way in real systems,
is directly linked to the easily observed SAD. We recall that low values of θ correspond
to a world in which different species, or types, are able to influence only a small number of
other species. High values of θ correspond to the situation where different types may have
an impact on the vitality of a large number of other species. 

The initial descent in both curves from the global peak at p = 1 is due to the large 
number of sites with only one occupant. In nature, sampling difficulties would mean that 
these sites would not be detected so this first aspect is not seen in observed SADs. (It is 
particularly marked for our model since we use each site as one species and do not coarse- 
grain.) But the second peak does correspond well to results from the field, though it should 
be pointed out that the proportion of all sites with more than two individuals is only about 
30% in each case. However, this is sufficient to detect the evolution of the SAD. 

We note that a recent study of a simplified version of the Tangled Nature model by
Rikvold and Zia [Rikvold and Zia, 2003; Rikvold and Zia] found no temporal evolution of
the statistics of the model. The reason for this may be that they use a relatively short genome
length L = 13 together with a very substantial simulation time of order 10^ generations. We 
have observed previously that the time to reach a stationary state explodes with genome 
length [Christensen et al., 2002| . 

f Neutral and niche theory 

There has recently been much interest in the neutral theory of biodiversity [Hubbell, 2001;
Bell, 2001]. Despite making assumptions that are anathema to traditional niche models (all
individuals are the same and adaptations to specific environmental niches are essentially
unimportant) it has been successful in making predictions about real world ecology,
although its effectiveness in modelling the species abundance has recently been called into
question [McGill, 2003].



At t = 0, the Tangled Nature model is neutral as all individuals are the same. However,
the dynamics immediately breaks this neutrality as configurations are spontaneously
generated. Once individuals become differentiated, interactions matter and the evolution is
better described by niche theory, although μ and $p_{kill}$ remain neutral. Unlike the models
in [Hubbell, 2001], we have no spatial aspect: we deal with only one large metacommunity
as opposed to many local communities aggregating to form the metacommunity. 

The only measure in our model considered by the neutral theory is the species abundance, 
which we find takes an approximate log-normal form. The shape predicted by neutrality, 
the zero-sum multinomial (ZSM) distribution, is quite similar to the log-normal except that
the ZSM has a long right tail and is much harder to calculate [McGill, 2003]. To detect
whether our distributions are closer to the ZSM or the log-normal would be computationally 
prohibitive as the genome length L would need to be much larger. Perhaps in the future, 
models like Tangled Nature could be used to investigate the relative importance of niche 
and neutral effects. 
4 DISCUSSION



Our most important results are that temporal evolution of the network properties of an 
ecosystem and a realistic form for the species abundance are only seen if the genotype space 
is well connected. This is interpreted here as meaning that an occupied genotype is likely 
to interact with many other (potentially occupied) genotypes. No evolution at the level of 
ecosystems can occur in a world where most possible genotypes are inert, i.e. whether they 
are present or not will have very little influence on other organisms. It is easy to overlook 
the importance of the entire network of interactions when dealing with small communities of 
organisms on a macroscopic scale, but easier to visualise with colonies of billions of bacteria. 

We suggest that this observation can be used to gain insight into the potential underlying 
connectivity between biota. Imagine two microbial evolution experiments. In one case, the 
microbial ecosystem evolves towards an interwoven or entangled ecology. In the other, little 
evolution is observed in the structure of the ecological properties of the microbial community. 
One might, according to the result from our model, anticipate that the first system consists 
of microbes from a part of genotype space in which types influence each other, whereas 
the second system consists of genotypes from a region of space consisting of mainly inert 
organisms. 

From our results, it is tempting to speculate that the observed degree of diversity, com- 
plexity and adaptation of living matter may be directly related to a high level of interde- 
pendence between organisms. Thus Darwin's entangled bank may be a useful image to keep 
in mind when studying the evolution of large collections of individuals. 

We are grateful to Gunnar Pruessner for extensive help with the development of the 
parallelised code that enabled us to probe the model at much deeper timescales and for 
broader ensembles than had been possible before. We wish to thank Andy Thomas for his 
fantastic technical support. Without his help and dedication, this project would not have 
been possible. We are indebted to Dan Moore, Brendan Maguire and Phil Mayers for their 
continuous support. We thank Albert Diaz-Guilera for inspiring discussions and Ed Johnson 
for reading the manuscript. Paul Anderson thanks EPSRC for research funding. 
Figure 1: Top: Degree histogram for θ = 0.005. Bottom: θ = 0.25. Solid lines, random;
dotted lines, simulation. From the left, the pairs of curves are for t=500, 5000 and 500000. 
At later times, the number of active links increases for both the simulation and random 
data. 
Figure 2: Top: Distribution of interaction strengths between individuals for θ = 0.005.
Bottom: θ = 0.25. Inset: Entire distribution. Solid lines, random; crosses, simulation at
t=500; dotted lines, simulation at t=500000. All plots are normalised so that their area is
one. For high θ, a significant increase in positive interactions is seen. For low θ, a change is
seen but for trivial reasons.
Figure 3: Maximum cluster size across all realisations for θ = 0.005. Solid line, random;
dotted line, simulation. Clusters produced by the simulation are larger than those produced
in a history-independent network.
Figure 4: Species abundance functions for the simulations only. Dashed line, t=500;
dashed-dotted line, t=5000; solid line, t=500000. Low θ on the left, high θ on the right. The
ecologically realistic log-normal form is only seen for high θ.